Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Borrowers request personal loans on Prosper and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. Investors can consider borrowers’ credit scores, ratings, and histories and the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.
Prosper verifies borrowers’ identities and select personal data before funding loans and manages all stages of loan servicing. Prosper’s unsecured personal loans are fully amortized over a period of three or five years, with no pre-payment penalties. Prosper generates revenue by collecting a one-time fee on funded loans from borrowers and assessing an annual loan servicing fee to investors.
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : chr "1021339766868145413AB3B" "10273602499503308B223C1" "0EE9337825851032864889A" "0EF5356002482715299901A" ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : chr "2007-08-26 19:09:29.263000000" "2014-02-27 08:28:07.900000000" "2007-01-05 15:00:47.090000000" "2012-10-22 11:02:35.010000000" ...
## $ CreditGrade : chr "C" "" "HR" "" ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : chr "Completed" "Current" "Completed" "Current" ...
## $ ClosedDate : chr "2009-08-14 00:00:00" "" "2009-12-17 00:00:00" "" ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : chr "" "A" "" "A" ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : chr "CO" "CO" "GA" "GA" ...
## $ Occupation : chr "Other" "Professional" "Other" "Skilled Labor" ...
## $ EmploymentStatus : chr "Self-employed" "Employed" "Not available" "Employed" ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : chr "True" "False" "False" "True" ...
## $ CurrentlyInGroup : chr "True" "False" "True" "False" ...
## $ GroupKey : chr "" "" "783C3371218786870A73D20" "" ...
## $ DateCreditPulled : chr "2007-08-26 18:41:46.780000000" "2014-02-27 08:28:14" "2007-01-02 14:09:10.060000000" "2012-10-22 11:02:32" ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : chr "2001-10-11 00:00:00" "1996-03-18 00:00:00" "2002-07-27 00:00:00" "1983-02-28 00:00:00" ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : chr "$25,000-49,999" "$50,000-74,999" "Not displayed" "$25,000-49,999" ...
## $ IncomeVerifiable : chr "True" "True" "True" "True" ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : chr "E33A3400205839220442E84" "9E3B37071505919926B1D82" "6954337960046817851BCB2" "A0393664465886295619C51" ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : chr "2007-09-12 00:00:00" "2014-03-03 00:00:00" "2007-01-17 00:00:00" "2012-11-01 00:00:00" ...
## $ LoanOriginationQuarter : chr "Q3 2007" "Q1 2014" "Q1 2007" "Q4 2012" ...
## $ MemberKey : chr "1F3E3376408759268057EDA" "1D13370546739025387B2F4" "5F7033715035555618FA612" "9ADE356069835475068C6D2" ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
First, we explore borrower-related variables and their characteristics.
What is the term chosen by borrowers?
##
## 12 36 60
## 1614 87778 24545
The 36-month term is by far the most common: it was chosen for 87,778 of the 113,937 loans.
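As a rough illustration of how a count like this can be produced, here is a minimal sketch in base R. The data frame `loans` is a tiny made-up stand-in for the Prosper data (only the column name `Term` comes from the data set), so the counts differ from the table above.

```r
# Toy stand-in for the Prosper data; only the column name Term is real.
loans <- data.frame(Term = c(36, 36, 60, 12, 36, 60))

# Count loans per term (in months) and pick the most frequent one.
term_counts <- table(loans$Term)
most_common_term <- names(term_counts)[which.max(term_counts)]

print(term_counts)
print(most_common_term)
```

On the real data the same two calls would be applied to the full `Term` column.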
Next, we explore the loan origination quarter, summarized by year.
## Number Of Borrowers Percentage
## 2005 22 0.02
## 2006 5906 5.18
## 2007 11460 10.06
## 2008 11552 10.14
## 2009 2047 1.80
## 2010 5652 4.96
## 2011 11228 9.85
## 2012 19553 17.16
## 2013 34345 30.14
## 2014 12172 10.68
The table makes it clear that after the dip in 2009, the number of loans increased drastically: 2009 accounts for only 1.8% of the loans in the data set, while 2013 accounts for about 30%. The 2014 figure is lower only because the data ends in March 2014.
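A table of counts and percentages like the one above could be built roughly as follows. This is only a sketch: the `year` vector is made up, and the column names `NumberOfLoans` and `Percentage` are my own choice.

```r
# Hypothetical origination years; the real values are derived from
# LoanOriginationQuarter.
year <- c(2009, 2012, 2013, 2013, 2013, 2013, 2012, 2014)

counts <- table(year)

# One row per year: absolute count and share of all loans in percent.
year_summary <- data.frame(
  NumberOfLoans = as.vector(counts),
  Percentage    = round(100 * as.vector(prop.table(counts)), 2),
  row.names     = names(counts)
)

print(year_summary)
```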
Next, we look at the range of interest rates on Prosper loans.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The borrower rate ranges from 0 to about 0.5. Three quarters of the loans have an interest rate of 0.25 or less. It is also interesting that some loans have a zero interest rate.
Let’s check the number of borrowers with zero interest rates.
## [1] 8
There are 8 loans with a zero borrower rate. I could not work out why these borrowers were given such a special offer. Maybe they were of some interest to the lenders: before 2009, lenders determined the interest rates, and all of these loans originated before 2009.
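The two checks described here (how many loans have a zero rate, and whether they all predate 2009) can be sketched in base R like this. The rows are made up; the column names follow the `str()` output above.

```r
# Made-up sample; column names follow the str() output above.
loans <- data.frame(
  BorrowerRate = c(0, 0.158, 0, 0.092, 0.275),
  LoanOriginationDate = c("2007-09-12 00:00:00", "2007-01-17 00:00:00",
                          "2008-03-04 00:00:00", "2014-03-03 00:00:00",
                          "2012-11-01 00:00:00"),
  stringsAsFactors = FALSE
)

# Loans with a borrower rate of exactly zero.
zero_rate <- loans[loans$BorrowerRate == 0, ]
n_zero <- nrow(zero_rate)

# Keep only the date part (first 10 characters) before converting.
origination <- as.Date(substr(zero_rate$LoanOriginationDate, 1, 10))
all_before_2009 <- all(origination < as.Date("2009-01-01"))

print(n_zero)
print(all_before_2009)
```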
Now we will explore which Prosper rating levels exist and which rating is most commonly given to borrowers.
## [1] "AA" "A" "B" "C" "D" "E" "HR" ""
##
## AA A B C D E HR
## 5372 14551 15581 18345 14274 9795 6935 29084
The distribution is roughly bell-shaped, and the most common Prosper ratings are A, B, C, and D. Note that 29,084 loans have no Prosper rating at all.
Let’s check for what purpose borrowers are taking loans.
##
## Auto Baby&Adoption Boat
## 2572 199 85
## Business Cosmetic Procedure Debt Consolidation
## 7189 91 58308
## Engagement Ring Green Loans Home Improvements
## 217 59 7433
## Household Expenses Large Purchases Medical/Dental
## 1996 876 1522
## MotorCycle Not Available Other
## 304 16965 10494
## Personal Loan RV Student Use
## 2395 52 756
## Taxes Vacation Wedding Loans
## 885 768 771
From the graph, we can see that the majority of borrowers take loans for Debt Consolidation (58,308 loans). Setting aside the "Not Available" and "Other" categories, the next most common purposes are Home Improvements and Business.
Let’s explore the geographical distribution of borrowers.
The next most common borrower states are FL, GA, IL, NY, and TX.
Next, we explore the range of loan amounts borrowers request.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The distribution is positively skewed. The minimum loan amount is 1,000 and the maximum is 35,000. The third quartile is 12,000, so there is a big gap between Q3 and the maximum.
Let’s see how the graph changes when the x-axis is limited to the range from 0 to the 95th percentile.
It seems that the majority of loans are less than 10,000.
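Trimming at the 95th percentile can be sketched as below. The loan amounts are made up, and whether the original plot dropped the upper values or only changed the axis limits is not stated, so treat this as one possible approach.

```r
# Made-up loan amounts standing in for LoanOriginalAmount.
amount <- c(1000, 2500, 4000, 4000, 6500, 8000, 10000, 12000, 15000, 35000)

# 95th percentile, used as the upper limit of the x-axis.
cutoff <- quantile(amount, 0.95)

# Values that remain visible once the axis stops at the cutoff.
trimmed <- amount[amount <= cutoff]

print(cutoff)
print(length(trimmed))
```

The `cutoff` value can then be passed as the upper x limit of a histogram, or `trimmed` can be plotted directly.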
Now we will check borrowers’ stated monthly income.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
There seems to be an outlier: the maximum stated monthly income is 1,750,003, far above the third quartile of 6,825.
I will change the x limits to look at the graph more closely.
Borrowers with lower monthly incomes are more likely to take loans. It is also interesting that some borrowers state a monthly income of zero and still managed to get a loan.
Let’s check the number of people who got loans with zero income.
## [1] 1394
A total of 1,394 people got loans with zero income. This group includes listings created both before and after 2009, so it cannot be explained by lenders taking a special interest in them under the pre-2009 model. All of these people fall into the "$0" or "Not employed" income range. Maybe they showed some property to get the loan, or they do some other kind of work that does not count as monthly income.
Next, we look at the income range graph.
##
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
Most loans went to people in the $25,000-74,999 income range.
Let’s look at the debt-to-income ratio graph.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
To get a clearer graph, we limit the x-axis to the 99th percentile.
## 50% 90% 99%
## 0.22 0.42 0.86
Now the graph looks much better. 99% of borrowers have a debt-to-income ratio below 0.86. This is a sensible number because people cannot spend all of their income on loan payments.
Let’s investigate how many people have a debt-to-income ratio greater than 1.
##
## FALSE TRUE
## 104584 799
799 people took a risk: their debt-to-income ratio is greater than 1.
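The quantile table and the FALSE/TRUE count above can be reproduced in spirit with the following sketch. The `dti` vector is made up (it only mimics `DebtToIncomeRatio`, including missing values), so the numbers differ from the real output.

```r
# Made-up debt-to-income ratios with missing values.
dti <- c(0.17, 0.18, 0.06, NA, 0.26, 0.36, 10.01, 0.24, 0.25, 0.22, NA)

# Median, 90th and 99th percentile; NAs must be dropped explicitly.
dti_quantiles <- quantile(dti, probs = c(0.5, 0.9, 0.99), na.rm = TRUE)

# How many borrowers owe more than they earn? table() skips NAs by default.
over_one <- table(dti > 1)

print(dti_quantiles)
print(over_one)
```

Because `table()` ignores missing values, the FALSE and TRUE counts add up to the number of non-missing ratios, which is also the case in the output above (104,584 + 799 = 113,937 - 8,554).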
Let’s look at the status of their loans.
Most of these people were able to complete their loans, which suggests that they have other sources of income.
Since Prosper is a peer-to-peer company, we will now see how many investors fund each loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
##
## 1
## 27814
This is the graph for loans with more than 1 investor.
27,814 loans have only 1 investor.
Now we will look at lender yield.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
## [1] 22
Out of 113,937 loans, there are only 22 cases where the lender yield is negative, that is, where the lender takes a loss. The mean lender yield is 0.1827.
This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
Prosper rating, interest rate, term, and loan original amount seem to be the main features. I plan to see how these factors are interrelated and how other factors influence them.
Analyzing credit score, employment status, income range, stated monthly income, loan category, and so on can help to better understand the main features.
I created two variables.
The first is a new variable named ListingCategory..string. The existing variable ListingCategory..numeric. contains numbers ranging from 0 to 20. For better analysis, I created ListingCategory..string, which holds the category names such as “Debt Consolidation”, “Home Improvements”, “Business”, “Personal Loan”, “Student Use”, “Auto”, and so on.
The second variable is LoanOriginationYear. The existing variable LoanOriginationQuarter holds the quarter of origination. For better analysis, I combined the quarters into their respective years (for example, Q1 2005, Q2 2005, Q3 2005, and Q4 2005 all become 2005).
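The two derived variables could be created along these lines. This is a sketch under assumptions: the rows are made up, the lookup table is deliberately partial, and the code-to-name pairs are my reading of the Prosper variable dictionary, so they should be checked against it before use.

```r
# Small made-up sample; column names follow the str() output above.
loans <- data.frame(
  ListingCategory..numeric. = c(0, 2, 1, 6, 3),
  LoanOriginationQuarter = c("Q3 2007", "Q1 2014", "Q1 2007",
                             "Q4 2012", "Q2 2013"),
  stringsAsFactors = FALSE
)

# Partial lookup table (assumption: verify against the data dictionary).
# Codes 7-20 are omitted here.
category_names <- c("0" = "Not Available", "1" = "Debt Consolidation",
                    "2" = "Home Improvements", "3" = "Business",
                    "4" = "Personal Loan", "5" = "Student Use",
                    "6" = "Auto")

# Look up each numeric code by its name in the table.
loans$ListingCategory..string. <-
  unname(category_names[as.character(loans$ListingCategory..numeric.)])

# Drop the "Qn " prefix to keep only the year.
loans$LoanOriginationYear <-
  as.integer(sub("^Q[1-4] ", "", loans$LoanOriginationQuarter))

print(loans)
```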
Here, I set up a data frame containing the variables of interest for further analysis.
This graph shows the correlations between the different variables.
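A correlation matrix of this kind can be computed as sketched below. The values are the first ten rows shown in the `str()` output above; which variables the original data frame actually contained is not stated, so the selection here is an assumption.

```r
# First ten rows of a few numeric columns, copied from the str() output.
vars <- data.frame(
  CreditScoreRangeLower = c(640, 680, 480, 800, 680, 740, 680, 700, 820, 820),
  CurrentCreditLines    = c(5, 14, NA, 5, 19, 21, 10, 6, 17, 17),
  DebtToIncomeRatio     = c(0.17, 0.18, 0.06, 0.15, 0.26,
                            0.36, 0.27, 0.24, 0.25, 0.25),
  LoanOriginalAmount    = c(9425, 10000, 3001, 10000, 15000,
                            15000, 3000, 10000, 10000, 10000),
  Term                  = c(36, 36, 36, 36, 36, 60, 36, 36, 36, 36)
)

# Pairwise Pearson correlations; each pair uses the rows where both
# values are present, so a single NA does not discard the whole row.
cor_matrix <- round(cor(vars, use = "pairwise.complete.obs"), 2)

print(cor_matrix)
```

With only ten rows the coefficients are not meaningful; the point is the call itself.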
Now we will look at the relationship between borrower rate and Prosper rating.
The borrower rate is highly dependent on the Prosper rating: the interest rate increases as the Prosper rating worsens. AA is the top rating and HR is the lowest.
Now we will analyze the basis on which the Prosper rating is given.
It seems that employment status plays a role in determining the Prosper rating. Employed borrowers tend to have a better Prosper rating than borrowers who are not employed.
Next, we will see how income range influences the Prosper rating.
The higher the income range, the better the Prosper rating tends to be, probably because higher earners find it easier to pay their debts on time.
Next, we will see how credit score influences the Prosper rating.
The credit score influences the Prosper rating: as the credit score increases, the Prosper rating improves.
## ProsperRating..Alpha.: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 680.0 740.0 780.0 774.1 800.0 880.0
## --------------------------------------------------------
## ProsperRating..Alpha.: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 640.0 700.0 720.0 729.9 760.0 880.0
## --------------------------------------------------------
## ProsperRating..Alpha.: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 680.0 700.0 706.9 740.0 860.0
## --------------------------------------------------------
## ProsperRating..Alpha.: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 660.0 680.0 689.9 720.0 880.0
## --------------------------------------------------------
## ProsperRating..Alpha.: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 660.0 680.0 680.3 700.0 860.0
## --------------------------------------------------------
## ProsperRating..Alpha.: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600.0 640.0 660.0 662.5 680.0 860.0
## --------------------------------------------------------
## ProsperRating..Alpha.: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 600 660 680 677 700 860
The mean credit score generally decreases as the Prosper rating decreases, from 774 for AA down to 663 for E, although HR (677) is slightly above E. There seems to be a strong relationship between the two.
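Grouped summaries in the format shown above are what base R's `by()` prints, so the table was probably produced with something similar to this sketch. The rows are made up; the column names follow the `str()` output.

```r
# Made-up sample; the empty string marks loans without a Prosper rating.
loans <- data.frame(
  ProsperRating..Alpha. = c("AA", "AA", "A", "A", "HR", "HR", ""),
  CreditScoreRangeLower = c(780, 800, 720, 740, 660, 680, 640),
  stringsAsFactors = FALSE
)

# Drop unrated loans and order the ratings from best (AA) to worst (HR).
rated <- loans[loans$ProsperRating..Alpha. != "", ]
rated$ProsperRating..Alpha. <- droplevels(factor(
  rated$ProsperRating..Alpha.,
  levels = c("AA", "A", "B", "C", "D", "E", "HR")
))

# Five-number summary plus mean of the credit score for each rating.
score_by_rating <- by(rated$CreditScoreRangeLower,
                      rated$ProsperRating..Alpha., summary)
mean_by_rating <- tapply(rated$CreditScoreRangeLower,
                         rated$ProsperRating..Alpha., mean)

print(score_by_rating)
print(mean_by_rating)
```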
Now we will see what factors influence the credit score.
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeLower and CurrentCreditLines
## t = 46.809, df = 106330, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1361976 0.1479760
## sample estimates:
## cor
## 0.1420918
More credit lines are associated with a slightly better credit score, although the correlation is weak (r = 0.14).
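The test output above comes from `cor.test()`. As a small sketch, here is the same call on the first ten rows shown in the `str()` output; with so few rows the estimate will not match the full-data value of 0.14.

```r
# First ten rows of both columns, copied from the str() output above.
credit_score <- c(640, 680, 480, 800, 680, 740, 680, 700, 820, 820)
credit_lines <- c(5, 14, NA, 5, 19, 21, 10, 6, 17, 17)

# Pearson correlation test; incomplete pairs are dropped automatically,
# so nine of the ten rows are used here.
result <- cor.test(credit_score, credit_lines)

print(result$estimate)   # sample correlation r
print(result$parameter)  # degrees of freedom = complete pairs - 2
```

This also explains why the degrees of freedom differ between the tests in this section: each test uses only the rows where both variables are present.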
##
## Pearson's product-moment correlation
##
## data: CreditScoreRangeLower and TotalInquiries
## t = -96.631, df = 112780, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2819071 -0.2711270
## sample estimates:
## cor
## -0.2765257
The fewer the inquiries, the better the credit score (r = -0.28).
##
## Pearson's product-moment correlation
##
## data: MonthlyLoanPayment and CreditScoreRangeLower
## t = 102.99, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2871995 0.2978465
## sample estimates:
## cor
## 0.292532
The larger the monthly loan payment, the better the credit score (r = 0.29).
##
## Pearson's product-moment correlation
##
## data: BorrowerRate and CreditScoreRangeLower
## t = -175.17, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4661358 -0.4569730
## sample estimates:
## cor
## -0.4615667
Higher credit scores come with better (lower) interest rates (r = -0.46).
Now we will see how monthly income, term, and loan original amount are influenced by different factors.
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and MonthlyLoanPayment
## t = 67.764, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1912423 0.2024055
## sample estimates:
## cor
## 0.1968303
People who have more income tend to have higher monthly loan payments (r = 0.20).
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1956816 0.2068243
## sample estimates:
## cor
## 0.2012595
The higher the income, the higher the loan amount taken (r = 0.20).
##
## $0 $1-24,999 $25,000-49,999 $50,000-74,999 $75,000-99,999
## 621 7274 32192 31050 16916
## $100,000+
## 17337
But beyond the $25,000-49,999 range, the number of people taking loans generally decreases as income increases. This seems right because people with higher incomes are more self-sufficient and may not need personal loans.
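The table above lists the income ranges in their natural order and leaves out "Not displayed" and "Not employed". One way to get that result is an ordered factor, as in this sketch with made-up values.

```r
# Made-up IncomeRange values, including the two non-numeric categories.
income <- c("$25,000-49,999", "$50,000-74,999", "Not displayed",
            "$25,000-49,999", "$100,000+", "$0", "Not employed")

# Income ranges in ascending order; anything not listed becomes NA.
income_levels <- c("$0", "$1-24,999", "$25,000-49,999",
                   "$50,000-74,999", "$75,000-99,999", "$100,000+")
income_f <- factor(income, levels = income_levels, ordered = TRUE)

# table() drops the NAs, so only the six real ranges are counted.
income_counts <- table(income_f)

print(income_counts)
```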
Employed borrowers seem to get higher loan amounts.
Loan amounts are highest for Debt Consolidation and Baby&Adoption.
Now we will see for what purpose people take loans when the loan origination year comes into the picture.
The majority of loans originated in the years 2012-2014. It seems that in earlier years people did not take Personal Loan and Student Use loans.
Borrowers can get larger loans when they choose to pay them off over more years.
Term has an influence on the borrower rate.
##
## Pearson's product-moment correlation
##
## data: LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3341283 -0.3237719
## sample estimates:
## cor
## -0.3289599
As the loan amount increases, the interest rate tends to decrease (r = -0.33).
How did the feature(s) of interest vary with other features in the dataset?
The borrower rate is related to Prosper rating, credit score, loan original amount, and term. There is a moderate negative relationship between borrower rate and credit score, with a Pearson correlation of r = -0.46. In turn, credit score is related to total inquiries, credit lines, and monthly loan payments. Loan original amount is influenced by term, employment status, and listing category.
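The value -0.46 is the Pearson coefficient r reported by the correlation test earlier, not R^2 (which can never be negative). As a quick worked check, squaring r gives the share of variance a simple linear fit between the two variables would explain:

```r
# Pearson estimate for BorrowerRate vs CreditScoreRangeLower,
# copied from the cor.test output above.
r <- -0.4615667

# For a simple linear regression with one predictor, R^2 equals r^2.
r_squared <- r^2

print(round(r_squared, 3))
```

So credit score alone accounts for roughly a fifth of the variation in borrower rate.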
(not the main feature(s) of interest)?
There is a moderate negative relationship between borrower rate and credit score (Pearson r = -0.46). In turn, there is a strong relationship between credit score and Prosper rating.
In this section, we will see how the main factors are interrelated.
At the same level of Prosper rating and credit score, a longer term gives borrowers the chance to apply for a higher loan amount.
Next, we will see whether income influences loan amount. In the bivariate analysis, we saw that loan original amount and stated monthly income have a correlation of about r = 0.2.
Now we will see how they behave when term comes into the picture.
Borrowers with a good Prosper rating can get lower borrower rates and, at the same time, take larger loans.
Even with low income, people can take higher loan amounts when they choose to pay off over 5 years. This seems reasonable because the monthly loan payments stay affordable and the debt-to-income ratio stays well below 1.
Overall, borrowers of every employment status can get larger loans, but they have to choose a longer term. Still, the graph clearly shows that employed borrowers take larger loan amounts than the others in each term group.
Next is the graph of loan original amount vs. income range.
Here too, borrowers can take larger loans when they are willing to pay over a longer term and they earn more.
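The pattern described here (loan amount by income range within each term) can also be checked numerically. The sketch below uses made-up rows and the median as the summary statistic, which is my choice and not necessarily what the boxplot comparison relied on.

```r
# Made-up sample; column names follow the str() output above.
loans <- data.frame(
  Term = c(36, 36, 60, 60, 36, 60),
  IncomeRange = c("$25,000-49,999", "$25,000-49,999", "$25,000-49,999",
                  "$100,000+", "$100,000+", "$100,000+"),
  LoanOriginalAmount = c(4000, 6000, 10000, 20000, 12000, 25000)
)

# Median loan amount for every combination of term and income range.
median_amount <- aggregate(LoanOriginalAmount ~ Term + IncomeRange,
                           data = loans, FUN = median)

print(median_amount)
```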
In the bivariate analysis, we saw that larger loan amounts come with better interest rates, with a correlation of r = -0.33. But when term comes into the picture, interest rates are a little higher.
Regardless of their credit score, Prosper rating, employment status, and monthly income, borrowers have the opportunity to take higher loan amounts, but they have to choose a longer term to pay them off.
People who have more income are likely to take higher loan amounts. I further analyzed loan original amount with respect to borrower rate: people can borrow more money, but when term comes into the picture, interest rates are a little higher.
Borrowers with a good Prosper rating can get lower borrower rates and, at the same time, take larger loans. People with a lower Prosper rating cannot take large loans such as $30,000, and they have to pay higher borrower rates even for smaller loan amounts. This trend seems quite normal because lenders take a risk when they give loans to people with a bad Prosper rating, so they should get the benefit of higher interest rates. It is similar to the stock market: whoever takes the risk may get a huge profit or a loss.
From this boxplot it is clear that borrowers can take larger loans when they are willing to pay over a longer term and they earn more. Prosper also makes sure that even people who take higher loan amounts have a debt-to-income ratio of less than 1.
Some insights can be drawn from this graph.
It seems that people’s way of living has changed a lot since 2010. If more data were available to analyze, it would be possible to reach a clearer conclusion about living styles.
The data set has nearly 114,000 loans from November 2005 to March 2014. After 2009, the number of loans increased drastically. Prosper also changed its business model in 2009, and this might have attracted many borrowers.
Before, lenders used to determine the borrower rate; now Prosper fixes interest rates depending on credit risk. Many interesting insights can be drawn from this data. Initially, I was very confused by the large number of variables, but as time progressed I got the hang of them. It is also surprising to see that the purposes for which people take loans have changed drastically over the years.
I think a lot more can be analyzed using this data, such as why some people are not able to pay their loans on time, what determines interest rates, what reasons make people take loans, and so on.